Abstract

In this paper, the problem of fault diagnosis in multiprocessor systems
is considered under a uniformly probabilistic model in which processors
are faulty with probability p. This work focuses on minimizing
the number of tests that must be conducted in order to correctly diag- 
nose the state of every processor in the system with high probability. 
A diagnosis algorithm that can correctly diagnose the state of every 
processor with probability approaching one in a class of systems per- 
forming slightly greater than a linear number of tests is presented. A 
nearly matching lower bound on the number of tests required to achieve 
correct diagnosis in arbitrary systems is also proven. The number of 
tests required under this probabilistic model is shown to be significantly 
less than under a bounded-size fault set model. Because the number of 
tests that must be conducted is a measure of the diagnosis overhead, 
these results represent a dramatic improvement in the performance of 
system-level diagnosis techniques. 


1 Introduction 

In this paper, the fault diagnosis capabilities of multiprocessor systems in the pres- 
ence of permanently faulty processors are examined. This problem has been well 
studied under the assumption that the number of faulty processors in the system 
is bounded by some value t. It has been shown that nt tests are necessary and
sufficient to correctly diagnose a system of n processors in this situation [1]. The
results of this paper will show that, under a probabilistic model in which processors
are faulty with probability p independently of one another, correct diagnosis
can be achieved with high probability using $O(n \cdot \omega(n))$ tests, where
$\omega(n) \to \infty$ (arbitrarily slowly) as $n \to \infty$. Thus, in the
bounded-size fault set model a quadratic number of tests are required to diagnose a
linear number of faults, while under this probabilistic model a linear expected number
of faults can be diagnosed with high probability using a number of tests growing
slightly faster than n.

The problem of multiprocessor system diagnosis in the presence of permanent 
faults has been addressed from a probabilistic viewpoint in several papers [2,3,4]. 
The first paper concerning probabilistic diagnosis [2] examined heterogeneous sys- 
tems in which each processor has an associated probability of failure. The authors 
examined the class of systems known as p-probabilistically diagnosable systems in 
which any fault set that has probability greater than or equal to p of occurring 
is uniquely diagnosable. The problem of determining whether a given system is 
p-probabilistically diagnosable has been shown to be co-NP-complete [3] while an 
$O(n^3)$ algorithm has been given [4] for determining the most likely fault set of a
system in the closely related weighted model. 

In p-probabilistically diagnosable systems, fault sets with probability of occur- 
rence slightly less than p can exist. Hence, the most likely fault set may be only 
slightly more probable than the next most likely fault set, meaning that the proba- 
bility of choosing the wrong fault set may be relatively high. In [5], the author exam- 
ined systems for which the correct fault set can be identified with high probability. 
The model utilized applies to homogeneous systems in which each processor has a 
common probability of failure p. An efficient diagnosis algorithm was presented that 
correctly diagnoses a class of systems containing cn log n tests, for c > 1/(log 1/p),
with probability approaching one. 

It was also claimed in [5] that this result was the best possible, i.e. that all 
algorithms must have probability approaching zero of achieving correct diagnosis 
in systems containing o(n log n) tests. Unfortunately, due to a subtle flaw in the 
proof, this result is untrue. This result was also used in [6] to prove a similarly 
flawed lower bound in a more general probabilistic model. A counterexample to 
the lower bound in [5] is given in which correct diagnosis is achieved with constant 
probability in a sequence of digraphs containing n - 1 tests. Also in this paper a 
diagnosis algorithm that produces correct diagnosis with probability approaching 
one in digraphs containing slightly more than a linear number of tests is given. 
Finally, a nearly matching lower bound on the number of tests required to achieve 
correct diagnosis with probability approaching one is proven. 

2 Preliminaries

The fundamental multiprocessor system model utilized in this paper was proposed 
in [7]. In this model, a system is represented as a directed graph with vertices of the
digraph representing processors in the system and edges of the digraph representing 
tests performed by one processor on another processor. In this section, all basic 
quantities related to this model are defined and methods of diagnosis algorithm 
performance evaluation are examined. 


2.1 Basic Definitions 

For a system composed of n processors, the set of processors will be represented
by $U = \{u_1, \ldots, u_n\}$. It is assumed that these processors are capable of performing
tests on one another. This situation will be represented by a digraph G(U, E), where
the vertex set U corresponds to the set of processors of the system and $(u, v) \in E$
if and only if processor u tests processor v in the system. Associated with each
$(u, v) \in E$ is a test outcome. This outcome will be a 1 (0) if u evaluates v as faulty
(fault-free). A complete collection of test outcomes constitutes a syndrome. Below,
syndromes, fault sets, and other fundamental concepts are formally defined.

Definition 1 For a digraph G(U, E ), a syndrome is a function from E to { 0,1}. 


Definition 2 For a digraph G(U, E), a fault set is a subset of the vertex set U.


Definition 3 For a digraph G(U, E) and $u \in U$, the tester set of u, denoted by
$\Gamma^{-1}(u)$, is given by

$$\Gamma^{-1}(u) = \{v \in U : (v, u) \in E\}$$


Definition 4 For a digraph G(U, E), a syndrome S, and $u \in U$, the failure set of
u, denoted by $\Delta_{in}(u)$, is given by

$$\Delta_{in}(u) = \{v \in \Gamma^{-1}(u) : S((v, u)) = 1\}$$

2.2 Diagnosis Algorithm Evaluation

A fundamental problem in multiprocessor systems is to identify the faulty processors
in a system given a syndrome. An algorithm for this problem is referred to as
a diagnosis algorithm. In much of the previous work in the system-level diagnosis
area, diagnosis algorithm evaluation has focused on worst-case performance. Under
a bounded-size fault set model, correct diagnosis can be guaranteed if the number
of faulty processors in the system is no greater than some value t < n/2. Since this
bound can only be satisfied with a given probability, a better measure of diagnosis 
algorithm performance is the probability that it correctly identifies the faulty pro- 
cessors in the system under a probabilistic model for the faults and test outcomes 
in a system. Such a model is presented in this paper. 

A diagnosis algorithm takes a syndrome as input and outputs a subset of the 
processors in the system. This subset contains exactly the processors diagnosed 
as faulty by the algorithm. Thus, for a set of faulty processors and a syndrome 
it is possible to evaluate if the output of a deterministic algorithm is correct by 
comparing the algorithm’s output with the set of faulty processors. Syndrome fault 
set pairs are therefore used as the basic element in the subsequent probabilistic 
analysis of diagnosis algorithm performance. Before proceeding with this analysis 
however, the notion of correct diagnosis must be formally defined. For a syndrome
S from a digraph G(U, E), and a deterministic algorithm A, let

$$\mathrm{Faulty}_A(S) = \{u \in U : \text{Algorithm } A \text{ diagnoses } u \text{ as faulty when run on } S\}$$

Thus $\mathrm{Faulty}_A(S)$ represents the output of Algorithm A when run on syndrome S.
With this, the diagnosis of an algorithm on a syndrome, fault set pair is characterized
in Definition 5.

Definition 5 For a syndrome, fault set pair (S, F) from a digraph G(U, E), a
deterministic algorithm A is said to produce

correct diagnosis if and only if $\mathrm{Faulty}_A(S) = F$,
partial diagnosis if and only if $\mathrm{Faulty}_A(S) \subset F$, and
false alarm diagnosis if and only if $\mathrm{Faulty}_A(S) \not\subseteq F$.

Note that Definition 5 differs from that used in some previous work where correct 
diagnosis may include faulty processors that are identified as fault-free so long as no 
fault-free processor is identified as faulty. In Definition 5, diagnosis is correct only 
when each fault-free processor is identified as fault-free and each faulty processor is
identified as faulty.
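
The three cases of Definition 5 can be sketched as a small classifier. This is only an illustration under the assumption that both the algorithm output and the fault set are given as Python sets; the function name `classify_diagnosis` is hypothetical, and the partial case is read as a proper subset.

```python
def classify_diagnosis(diagnosed, fault_set):
    """Classify an algorithm's output against the true fault set (Definition 5)."""
    diagnosed, fault_set = set(diagnosed), set(fault_set)
    if diagnosed == fault_set:
        return "correct"
    if diagnosed < fault_set:  # proper subset: some faulty processors were missed
        return "partial"
    return "false alarm"  # at least one fault-free processor was called faulty
```

For example, with fault set {1, 2}, the output {1} is a partial diagnosis, while {1, 3} is a false alarm diagnosis because processor 3 is fault-free.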





3 Probabilistic Model 


In this section, a probabilistic model for the behavior of a multiprocessor system is
presented. In this model, processors fail with probability p, fault-free processors
always produce the correct outcome when performing a test, and no assumptions
are made concerning the outcomes of tests performed by faulty processors. It will
be shown in this paper that, in contrast to the bounded-size fault set model, correct
diagnosis can be achieved with high probability in this model at relatively low cost.

For a digraph G(U,E), the sample space of this probability model will consist 
of all syndrome, fault set pairs in that digraph. Formally, 

$$\Omega_{G(U,E)} = \{(S, F) : F \subseteq U \text{ and } S \text{ is a function from } E \text{ to } \{0, 1\}\}.$$

Since no assumptions have been made concerning the outcomes of tests performed
by faulty processors, the probability of a particular syndrome given a fault set
may not be specified in this model. The basic events of the model consist of sets
of syndrome, fault set pairs which have the same fault set and whose syndromes
are identical except for the labels on edges out of faulty processors. Each
syndrome, fault set pair (S', F') is contained in a basic event B defined as

$$B = \{(S, F) : F = F' \text{ and } \forall (u, v) \in E \text{ with } u \in U - F,\ S((u, v)) = S'((u, v))\}$$

Note that there is a unique fault set associated with each basic event but that each
event may contain many distinct syndrome, fault set pairs. Now, let

$$\mathcal{B}_{G(U,E)} = \{B : B \text{ is a basic event of } G(U, E)\}.$$

The family of events $\mathcal{F}_{G(U,E)}$ in this probability space is the set of all subsets of
$\mathcal{B}_{G(U,E)}$. For a basic event B from a digraph G(U, E), let

$$E_{c0} = \{(u, v) \in E : \forall (S, F) \in B,\ u \in U - F \text{ and } v \in U - F\} \text{ and}$$

$$E_{c1} = \{(u, v) \in E : \forall (S, F) \in B,\ u \in U - F \text{ and } v \in F\}.$$

These sets represent, respectively, the set of edges that must be labeled zero (fault-free
processors testing fault-free processors) and the set of edges that must be labeled
one (fault-free processors testing faulty processors). Given these sets, the probability
of a basic event B in a digraph G(U, E) is defined as follows.

$$P_G(B) = \begin{cases} 0 & \text{if } \exists (u, v) \in E_{c0} \text{ s.t. } \forall (S, F) \in B,\ S((u, v)) = 1 \text{ or} \\ & \exists (u, v) \in E_{c1} \text{ s.t. } \forall (S, F) \in B,\ S((u, v)) = 0 \\ p^{|F'|}(1 - p)^{n - |F'|} & \text{otherwise} \end{cases}$$

where F' represents the unique fault set associated with B. The condition for which
a basic event has zero probability of occurrence is simply a check to make sure that
no fault-free processor produces an incorrect test outcome. Clearly,

$$\sum_{B \in \mathcal{B}_{G(U,E)}} P_G(B) = 1$$

and, hence, this is a legitimate probability measure.
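
The definition of the basic event probability can be illustrated with a short sketch. It assumes the dictionary representation of a syndrome (edge to outcome) and identifies a basic event by any one of its member pairs (S', F'); the function name `basic_event_probability` is hypothetical.

```python
def basic_event_probability(n, syndrome, fault_set, p):
    """Probability of the basic event containing (syndrome, fault_set).

    The event has probability zero if any fault-free tester reports an
    outcome that disagrees with the true state of the tested processor.
    Outcomes reported by faulty testers are ignored.
    """
    for (u, v), outcome in syndrome.items():
        if u not in fault_set and outcome != int(v in fault_set):
            return 0.0
    k = len(fault_set)
    return p**k * (1 - p) ** (n - k)
```

With n = 3, edges (1, 2) and (2, 3), fault set {2}, and p = 0.1, any syndrome in which processor 1 fails processor 2 belongs to a basic event of probability 0.1 · 0.9², whatever the faulty processor 2 reports about processor 3.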

The primary measure of the performance of a diagnosis algorithm used in this
paper will be the probability that the algorithm produces correct diagnosis as defined
in Definition 5. For a digraph G(U, E) and a deterministic algorithm A, let

$$\mathrm{Correct}_G(A) = \{(S, F) : \mathrm{Faulty}_A(S) = F\}$$

and let $\mathrm{NotCorrect}_G(A)$ represent the complement of $\mathrm{Correct}_G(A)$. Thus
$\mathrm{Correct}_G(A)$ represents the set of all syndrome, fault set pairs in a digraph for
which Algorithm A produces correct diagnosis. Note that it may be the case that
$\mathrm{Correct}_G(A) \notin \mathcal{F}_G$, in which case $P_G(\mathrm{Correct}_G(A))$ will not be defined. The output
of a particular diagnosis algorithm may depend on the outcomes of tests performed
by faulty processors and thus, the probability of correct diagnosis for the algorithm
cannot be determined until a probability distribution on these edges is specified.
For a digraph G(U, E), let $P_G'$ be a probability function defined on $\Omega_G$ such that
the family of events is equal to all subsets of $\Omega_G$ and $\forall B \in \mathcal{B}_G$, $P_G'(B) = P_G(B)$.
Such a probability function will be referred to as a refinement of $P_G$. Now let
$\mathcal{P}_G$ represent the set of all refinements of $P_G$. Since any type of behavior of the
faulty processors is allowed in this model, the probability of correct diagnosis for a
deterministic algorithm A in a digraph G(U, E), denoted by $\mathrm{PCD}_G(A)$, is defined to
be

$$\mathrm{PCD}_G(A) = \min_{P_G' \in \mathcal{P}_G} P_G'(\mathrm{Correct}_G(A)) = \min_{P_G' \in \mathcal{P}_G} \sum_{(S,F) \in \mathrm{Correct}_G(A)} P_G'((S, F))$$

Thus, when calculating the probability of correct diagnosis for an algorithm it is
assumed that the faulty processors perform their tests in the manner most detrimental
to the algorithm. Given a syndrome S, a random algorithm A chooses a
fault set F with some probability, call it $p_{A,S}(F)$, where $\sum_{F \subseteq U} p_{A,S}(F) = 1$. Thus,
for a digraph G(U, E) and a random diagnosis algorithm A, the probability of correct
diagnosis for Algorithm A is defined to be

$$\mathrm{PCD}_G(A) = \min_{P_G' \in \mathcal{P}_G} \sum_{(S,F) \in \Omega_G} P_G'((S, F)) \cdot p_{A,S}(F)$$
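
For a deterministic algorithm on a very small digraph, the minimum over refinements can be evaluated by brute force. The sketch below rests on an observation that follows from the definition but is not stated in the paper: a refinement may place all of a basic event's probability on any single pair in that event, so a basic event contributes to the minimum only if the algorithm is correct on every pair it contains. The function name `exact_pcd` and the dictionary syndrome representation are assumptions made for illustration.

```python
from itertools import combinations, product


def exact_pcd(n, edges, p, algorithm):
    """Worst-case probability of correct diagnosis, by exhaustive enumeration.

    `algorithm` maps a syndrome (dict edge -> outcome) to a set of processors.
    """
    total = 0.0
    for size in range(n + 1):
        for fault_set in map(frozenset, combinations(range(1, n + 1), size)):
            # Outcomes of fault-free testers are forced by the fault set.
            fixed = {(u, v): int(v in fault_set)
                     for (u, v) in edges if u not in fault_set}
            # Outcomes of faulty testers are arbitrary: try all of them.
            free = [e for e in edges if e[0] in fault_set]
            always_correct = all(
                algorithm({**fixed, **dict(zip(free, outcomes))}) == fault_set
                for outcomes in product((0, 1), repeat=len(free))
            )
            if always_correct:
                total += p**size * (1 - p) ** (n - size)
    return total


# Hypothetical example: an algorithm that never diagnoses anything as faulty
# is correct only when the fault set is empty.
always_empty = lambda syndrome: set()
pcd_empty = exact_pcd(3, [(1, 2), (2, 3)], 0.1, always_empty)
```

The enumeration is exponential in both n and the number of edges, so it is only meant for checking small examples by hand.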

4 Diagnosis Below n log n Edges

In [5], a powerful and efficient diagnosis algorithm that achieves correct diagnosis 
with probability approaching one in sequences of digraphs containing cn log n edges, 
for c > 1/(log 1/p), was presented. In this section, the question of whether correct
diagnosis is possible in digraphs containing o(n log n) edges is considered. In particular,
a sequence of digraphs containing n - 1 edges is exhibited for which a simple
diagnosis algorithm can achieve correct diagnosis with constant probability. 

Consider a sequence of digraphs $G_n(U_n, E_n)$ with $U_n = \{u_1, \ldots, u_n\}$ and $E_n$
defined as follows:

$$E_n = \{(u_1, u_2), (u_1, u_3), \ldots, (u_1, u_{n-1}), (u_1, u_n)\},$$

i.e. $u_1$ tests all other processors. Now, consider the following simple diagnosis
algorithm.

Algorithm Naive

Input: A syndrome S in a digraph G(U, E).

Output: A set $F \subseteq U$.

$F \leftarrow \emptyset$

for each $v \in \{u_2, u_3, \ldots, u_n\}$

    if $S((u_1, v)) = 1$ then $F \leftarrow F \cup \{v\}$

Algorithm Naive simply assumes that $u_1$ is fault-free and diagnoses a processor
as faulty if and only if it is failed by $u_1$. Clearly, if $u_1$ is faulty, Algorithm Naive
incorrectly diagnoses $u_1$ itself. If $u_1$ is fault-free however, Algorithm Naive produces
correct diagnosis. Thus, $\forall P_{G_n}' \in \mathcal{P}_{G_n}$,

$$P_{G_n}'(\mathrm{Correct}_{G_n}(\mathrm{Naive})) = P_{G_n}'(\{(S, F) : u_1 \text{ is fault-free}\}) = 1 - p$$

and therefore

$$\mathrm{PCD}_{G_n}(\mathrm{Naive}) = 1 - p.$$

Thus, this simple diagnosis algorithm produces correct diagnosis with constant probability
in a sequence of digraphs containing exactly n - 1 edges.
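
A minimal Python sketch of Algorithm Naive on this sequence of digraphs follows. Processors are numbered 1 through n with processor 1 in the role of $u_1$; the helper `star_syndrome`, which builds a syndrome consistent with a given fault set, and its parameter `faulty_outcome` are hypothetical additions for illustration.

```python
def naive_diagnosis(syndrome, trusted=1):
    """Algorithm Naive: diagnose v as faulty iff the trusted processor fails it."""
    return {v for (u, v), outcome in syndrome.items()
            if u == trusted and outcome == 1}


def star_syndrome(n, fault_set, faulty_outcome=0):
    """Syndrome for the digraph in which processor 1 tests processors 2..n.

    If processor 1 is faulty its outcomes are arbitrary; here they are all
    set to `faulty_outcome` as one possible behaviour.
    """
    syndrome = {}
    for v in range(2, n + 1):
        if 1 in fault_set:
            syndrome[(1, v)] = faulty_outcome
        else:
            syndrome[(1, v)] = 1 if v in fault_set else 0
    return syndrome
```

When processor 1 is fault-free the output equals the fault set. When processor 1 is faulty the output can never equal the fault set, because processor 1 is never placed in the output.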

The digraphs of the given sequence are composed of one processor testing the 
remaining processors. It will be shown that this highly irregular structure whereby 
some processors conduct a large number of tests while others may not conduct 
any is common to all systems of o(n log n) edges that can be diagnosed with high
probability. In Section 6, a class of irregular digraphs possessing a number of edges
growing just faster than n is given for which correct diagnosis can be achieved with
probability approaching one. In Section 7, it is shown that a linear number of edges
is required to achieve correct diagnosis with high probability in arbitrary digraphs.


5 A Simple Majority-Vote Algorithm 

In this section, a simple yet powerful diagnosis algorithm known as Algorithm Ma- 
jority is presented. In Algorithm Majority a processor is diagnosed as faulty if and 
only if it is failed by more than 1/2 the processors in its tester set. 

Algorithm Majority

Input: A syndrome S in a digraph G(U, E).

Output: A set $F \subseteq U$.

$F \leftarrow \emptyset$

for each $u \in U$

    if $|\Delta_{in}(u)| > |\Gamma^{-1}(u)|/2$ then $F \leftarrow F \cup \{u\}$


Theorem 1 For a digraph G(U, E), Algorithm Majority has a time complexity of
O(|E|) and a space complexity of O(|E|).

Proof: The failure set cardinalities as well as the tester set cardinalities can be
calculated in a single traversal of the labeled adjacency lists of the digraph. This
requires O(|E|) time. The only storage requirement for the algorithm aside from the
input and output is temporary variables to hold these values as they are calculated.
Hence, the space complexity is also O(|E|).
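
The single-traversal argument in the proof of Theorem 1 can be sketched in Python as follows. The syndrome is assumed to be a dictionary from edges (tester, tested) to outcomes, and the function name `majority_diagnosis` is hypothetical; the two counters play the role of the temporary variables mentioned in the proof.

```python
from collections import Counter


def majority_diagnosis(syndrome):
    """Algorithm Majority: u is faulty iff more than half of its testers fail it."""
    tested = Counter()  # |tester set| of each processor
    failed = Counter()  # |failure set| of each processor
    for (_, v), outcome in syndrome.items():  # one pass over the labeled edges
        tested[v] += 1
        failed[v] += outcome
    # Processors with no testers never appear here and are never diagnosed faulty.
    return {v for v in tested if 2 * failed[v] > tested[v]}
```

For instance, a processor failed by two of its three testers is diagnosed as faulty, while a processor failed by exactly one of its two testers is not, since the comparison is strict.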

Algorithm Majority is slightly more sophisticated than Algorithm Naive. Rather 
than blindly believing the test outcomes of a single processor, it relies on a majority- 
vote among the processors in the tester set of a given processor. It should be noted 
that for the special class of systems in which one processor tests every other processor 
and no other tests are conducted, Algorithms Naive and Majority are equivalent. 
Intuitively, when p < 1/2 the majority of processors in the system are fault-free and
Algorithm Majority should correctly diagnose most of the processors in the system. 
In the next section, the performance of Algorithm Majority is considered in detail. 

6 Performance of Algorithm Majority

In this section, it is shown that for a class of irregularly structured systems utilizing
a number of tests growing just faster than n, Algorithm Majority correctly diagnoses
every processor with probability approaching one. The exact number of tests required
by Algorithm Majority to achieve a given probability of correct diagnosis in
systems in this class is also examined.

6.1 Asymptotic Results 

Consider a class of systems in which there is a set of processors known as the testers.
The systems are such that any processor which is a tester tests all other processors
in the system. Any processor that is not a tester conducts no tests. Thus, a (small)
fraction of the processors are relied upon to satisfy all the testing requirements of
the system. Such a digraph will be referred to as a tester digraph, as defined
below.

Definition 6 A digraph G(U, E) is said to be a tester digraph if and only if
$\exists\, T_G \subseteq U$ such that

$$E = \{(u, v) : u \in T_G,\ v \in U, \text{ and } u \neq v\}.$$

The set $T_G$ is known as the testing set of G.

Figure 1 is an example of a tester digraph with 3 testers and 8 vertices. Assume
that more than 1/2 the testers in a tester digraph are fault-free. Clearly, more
than 1/2 the tests conducted on any processor that is not a tester will be accurate
and each such processor will be correctly diagnosed by Algorithm Majority. Now,
consider any tester t. If t is faulty, more than 1/2 the processors testing it are
fault-free and will fail it, meaning that t will be correctly diagnosed by Algorithm
Majority. If t is fault-free, at least 1/2 the processors testing it are fault-free and will
pass it. Since t is not failed by a majority of its tester set, it will again be correctly
diagnosed by Algorithm Majority. Hence, if more than 1/2 the testers in a tester
digraph are fault-free, Algorithm Majority produces correct diagnosis. Theorem 2
shows that if the number of testers is given by any function that increases with
n, this condition will be achieved with probability approaching one and hence the
probability of correct diagnosis for Algorithm Majority approaches one. In order to
prove this result the following corollary [8] to a theorem proved by Chernoff [9] is
needed.

Corollary 1 Let Y be a binomial random variable with parameters n and p. Then

$$P(Y < cnp) < e^{-(1-c)^2 np/2}, \quad 0 < c < 1$$

$$P(Y > cnp) < e^{-(c-1)^2 np/3}, \quad c > 1$$


Figure 1: A Tester Digraph


Theorem 2 Let $G_n(U_n, E_n)$ be a sequence of tester digraphs on n vertices with
testing sets $T_{G_n}$ satisfying $|T_{G_n}| = \omega(n)$, where $\omega(n) \to \infty$ as $n \to \infty$. If p < 1/2,
then $\mathrm{PCD}_{G_n}(\mathrm{Majority}) \to 1$ as $n \to \infty$.

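Proof: Let

$$\mathrm{GoodMaj}_{G_n} = \{(S, F) : |T_{G_n} \cap (U_n - F)| > |T_{G_n}|/2,\ \forall (u, v) \in E_{c0},\ S((u, v)) = 0 \text{ and } \forall (u, v) \in E_{c1},\ S((u, v)) = 1\}$$

Clearly,

$$\mathrm{GoodMaj}_{G_n} \subseteq \mathrm{Correct}_{G_n}(\mathrm{Majority})$$

and therefore, $\forall P_{G_n}' \in \mathcal{P}_{G_n}$,

$$P_{G_n}'(\mathrm{Correct}_{G_n}(\mathrm{Majority})) \ge P_{G_n}'(\mathrm{GoodMaj}_{G_n}) = 1 - \sum_{i=0}^{\lfloor |T_{G_n}|/2 \rfloor} \binom{|T_{G_n}|}{i} (1 - p)^i p^{|T_{G_n}| - i}$$

Now, since p < 1/2,

$$\frac{|T_{G_n}|}{2} = c(1 - p)|T_{G_n}|, \quad c = \frac{1}{2(1 - p)} < 1$$

and thus by Corollary 1, applied to the number of fault-free testers, which is binomial
with parameters $|T_{G_n}| = \omega(n)$ and 1 - p,

$$P_{G_n}'(\mathrm{Correct}_{G_n}(\mathrm{Majority})) > 1 - e^{-(1-c)^2 (1-p)\,\omega(n)/2}$$

Therefore

$$\mathrm{PCD}_{G_n}(\mathrm{Majority}) \to 1 \text{ as } n \to \infty.$$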
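
To get a feel for how loose the Chernoff step is, the exact binomial tail used in the proof can be compared numerically with the exponential bound from Corollary 1. This is only an illustrative sketch: the function names are hypothetical, `k` stands for the number of testers $|T_{G_n}|$, and the constant $c = 1/(2(1-p))$ is the one determined by $|T_{G_n}|/2 = c(1-p)|T_{G_n}|$.

```python
from math import comb, exp


def exact_failure(k, p):
    """P(at most floor(k/2) of k testers are fault-free), each faulty w.p. p."""
    q = 1 - p
    return sum(comb(k, i) * q**i * p ** (k - i) for i in range(k // 2 + 1))


def chernoff_failure(k, p):
    """Upper bound exp(-(1-c)^2 (1-p) k / 2) with c = 1 / (2 (1-p)); needs p < 1/2."""
    q = 1 - p
    c = 1 / (2 * q)
    return exp(-((1 - c) ** 2) * q * k / 2)
```

Both quantities decrease toward zero as k grows when p < 1/2; the exact tail is the one that matters for concrete system sizes, while the exponential bound is what drives the asymptotic statement.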


Thus Algorithm Majority produces correct diagnosis with probability approaching
one in a class of digraphs containing a number of edges given by $n \cdot \omega(n)$, for
$\omega(n)$ any function that increases with n. This is an extremely promising result because
under a bounded-size fault set model a quadratic number of tests are required
to withstand a linear number of faults, while this shows that in this probabilistic
model a linear expected number of faults can be tolerated with a number of tests
that is arbitrarily close to linear.

6.2 Concrete Bounds 

In this section, the number of tests required to achieve a given probability of correct
diagnosis in tester digraphs using Algorithm Majority is examined. For a tester
digraph G(U, E) with testing set $T_G$,

$$\mathrm{PCD}_G(\mathrm{Majority}) \ge 1 - \sum_{i=0}^{\lfloor |T_G|/2 \rfloor} \binom{|T_G|}{i} (1 - p)^i p^{|T_G| - i} \qquad (1)$$


Note that the probability of correct diagnosis depends only on the testing set cardinality
and not on n. For a given probability of failure, Inequality 1 can be used to
determine the number of testers needed for Algorithm Majority to achieve a specific
probability of correct diagnosis. The size of the testing set required to achieve a
correct diagnosis probability of 0.99 for various values of p is shown in Table 1. If the
probability of failure of a processor is 0.01, Algorithm Majority can achieve correct
diagnosis with a probability of 0.99 using a single test per processor regardless of
the number of processors in the system. This corresponds exactly to the example
given in Section 4 where a single tester tests every other processor in the system.
Hence the total number of tests utilized in this situation is n - 1. For a probability
of failure of 0.1 the testing set need only be of cardinality 5 for Algorithm Majority
to achieve a probability of correct diagnosis of 0.99. Thus, when the probability of
failure is small, correct diagnosis can be achieved with high probability using a total
number of tests that is near n. When p is near 1/2, more tests are necessary. Since
nearly 1/2 the processors in the system will be faulty in this situation, it is to be
expected that a larger number of tests are required. The important point is that
the total number of tests remains proportional to n regardless of the value of p.


      p      |T_G|
    0.01       1
    0.05       3
    0.10       5
    0.20      13
    0.30      31
    0.40     133

Table 1: Size of Testing Set Required for
Correct Diagnosis Probability of 0.99


       n       p     Bounded-size   Probabilistic
     100     0.01         400             99
     100     0.10        1800            495
     100     0.30        4100           3069
    1000     0.01       18000            999
    1000     0.10      123000           4995
    1000     0.30      334000          30969

Table 2: Total Number of Tests Necessary for Correct Diagnosis
Probability of 0.99
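
The testing set sizes discussed above can be recomputed from the right-hand side of Inequality 1. The sketch below searches for the smallest number of testers whose failure bound does not exceed a budget; the function names, the `max_failure` parameter (1 minus the target probability), the search `limit`, and the small floating-point tolerance are all assumptions made for illustration.

```python
from math import comb


def failure_bound(k, p):
    """Sum in Inequality 1: P(at most floor(k/2) of k testers are fault-free)."""
    q = 1 - p
    return sum(comb(k, i) * q**i * p ** (k - i) for i in range(k // 2 + 1))


def min_testers(p, max_failure=0.01, limit=1000):
    """Smallest testing set size whose failure bound is at most max_failure."""
    for k in range(1, limit + 1):
        # The tolerance guards against rounding when the bound equals the budget.
        if failure_bound(k, p) <= max_failure + 1e-12:
            return k
    raise ValueError("no testing set size up to `limit` meets the target")
```

For p = 0.05 and p = 0.10 this search returns 3 and 5 testers respectively, in agreement with the text. Note that the bound is not monotone between odd and even sizes: an even testing set needs a strict majority of one more tester than the odd size just below it, which is why the values in Table 1 are all odd.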

These results can be compared with the number of tests required under the 
bounded-size fault set model in the following manner. For a given n and p, determine 
t such that the probability of more than t out of the n processors being faulty is no 
greater than 0.01. Table 2 shows the results of this comparison for various values of 
n and p. For large n and small p the number of tests required under the probabilistic 
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model is dramatically lower than the number required under the bounded-size fa 
set model. For example, when » = 1000 and p = 0.01, the number of tests required 
in the probabilistic model is reduced by a factor of 18 over the bounded-size fault 

set model. 

7 A Lower Bound on the Number of Tests Necessary 
for Correct Diagnosis 

In this section, a lower bound on the number of tests necessary to achieve correct
diagnosis with high probability is proven. It is shown that if the number of edges in
an arbitrary sequence of digraphs grows slower than n then all diagnosis algorithms
have probability approaching zero of achieving correct diagnosis. This result implies
that Algorithm Majority achieves a probability approaching one of correct diagnosis
on systems that are very nearly as sparse as possible. Thus, this relatively simple
diagnosis algorithm is indeed extremely powerful.

When the number of edges in a sequence of digraphs grows slower than n, isolated
processors must exist. Intuitively, no diagnosis algorithm should be capable
of correctly identifying the state of all these isolated processors with high probability,
making diagnosis in such situations impossible. This is formally proven in
Theorem 3. The essence of the proof of Theorem 3 can be explained quite simply.
To prove that a deterministic diagnosis algorithm A has a probability approaching
zero of achieving correct diagnosis in a sequence of digraphs $G_n(U_n, E_n)$, a set of
(S, F) pairs disjoint from $\mathrm{Correct}_{G_n}(A)$ must be exhibited that has a probability
dominating the probability of $\mathrm{Correct}_{G_n}(A)$. For a given syndrome with isolated
vertices, it can be shown that so long as the number of isolated vertices approaches
infinity, the probability of that syndrome and a fault set with a particular labeling of
the isolated vertices is dominated by the probability of that syndrome and the fault
sets in which the isolated processors are relabeled in all possible ways. Thus, for any
$(S, F) \in \mathrm{Correct}_{G_n}(A)$, a set of syndrome, fault set pairs disjoint from $\mathrm{Correct}_{G_n}(A)$
can be exhibited that has probability dominating the probability of (S, F). It is also
shown that there exists a deterministic diagnosis algorithm that has performance
at least as good as the performance of any random algorithm, thus completing the
proof.

Theorem 3 Let $G_n(U_n, E_n)$ be a sequence of digraphs on n vertices, with
0 < p < 1 and $|E_n| \in o(n)$. For any random or deterministic diagnosis algorithm
A, $\mathrm{PCD}_{G_n}(A) \to 0$ as $n \to \infty$.


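Proof: Assume $\exists\, n_0$, c > 0, and a deterministic algorithm A such that $\forall n > n_0$,
$\mathrm{PCD}_{G_n}(A) > c$. This implies that $\forall P_{G_n}' \in \mathcal{P}_{G_n}$ and $\forall n > n_0$,

$$P_{G_n}'(\mathrm{Correct}_{G_n}(A)) > c.$$

Now, let $ISO_{G_n} \subseteq U_n$ represent the set of isolated vertices in $G_n(U_n, E_n)$. Clearly,

$$|ISO_{G_n}| \ge n - 2|E_n| \to \infty.$$

For a syndrome, fault set pair $(S, F) \in \mathrm{Correct}_{G_n}(A)$ let

$$\mathrm{Relabel}_{(S,F)} = \{(S', F') : S' = S,\ F' \neq F, \text{ and } F - ISO_{G_n} = F' - ISO_{G_n}\}$$

and let

$$\mathrm{AllLabel}_{(S,F)} = \mathrm{Relabel}_{(S,F)} \cup \{(S, F)\}.$$

Thus, $\mathrm{Relabel}_{(S,F)}$ consists of the syndrome, fault set pairs in which the processors
of $ISO_{G_n}$ are relabeled in all possible ways. Since A is deterministic, every pair in
$\mathrm{Relabel}_{(S,F)}$ lies in $\mathrm{NotCorrect}_{G_n}(A)$, and these sets are disjoint for distinct pairs
in $\mathrm{Correct}_{G_n}(A)$. Clearly, $\forall P_{G_n}' \in \mathcal{P}_{G_n}$,

$$P_{G_n}'(\mathrm{NotCorrect}_{G_n}(A)) \ge \sum_{(S,F) \in \mathrm{Correct}_{G_n}(A)} P_{G_n}'(\mathrm{Relabel}_{(S,F)}) = \sum_{(S,F) \in \mathrm{Correct}_{G_n}(A)} \left[ P_{G_n}'(\mathrm{AllLabel}_{(S,F)}) - P_{G_n}'((S, F)) \right]$$

and since all processors in the set $ISO_{G_n}$ are isolated,

$$P_{G_n}'((S, F)) = p^{|ISO_{G_n} \cap F|} (1 - p)^{|ISO_{G_n} \cap (U_n - F)|} \, P_{G_n}'(\mathrm{AllLabel}_{(S,F)}).$$

Therefore, $\forall P_{G_n}' \in \mathcal{P}_{G_n}$,

$$\sum_{(S,F) \in \mathrm{Correct}_{G_n}(A)} P_{G_n}'(\mathrm{AllLabel}_{(S,F)}) = \sum_{(S,F) \in \mathrm{Correct}_{G_n}(A)} \frac{P_{G_n}'((S, F))}{p^{|ISO_{G_n} \cap F|} (1 - p)^{|ISO_{G_n} \cap (U_n - F)|}} \ge \frac{\sum_{(S,F) \in \mathrm{Correct}_{G_n}(A)} P_{G_n}'((S, F))}{[\max(p, 1 - p)]^{|ISO_{G_n}|}}$$

and thus

$$P_{G_n}'(\mathrm{NotCorrect}_{G_n}(A)) \ge \left( \frac{1}{[\max(p, 1 - p)]^{|ISO_{G_n}|}} - 1 \right) \sum_{(S,F) \in \mathrm{Correct}_{G_n}(A)} P_{G_n}'((S, F))$$

So, by assumption, $\forall P_{G_n}' \in \mathcal{P}_{G_n}$ and $\forall n > n_0$,

$$P_{G_n}'(\mathrm{NotCorrect}_{G_n}(A)) > \left( \frac{1}{[\max(p, 1 - p)]^{|ISO_{G_n}|}} - 1 \right) c \to \infty$$

This is clearly a contradiction, implying that for any deterministic diagnosis algorithm
A and $\forall P_{G_n}' \in \mathcal{P}_{G_n}$,

$$P_{G_n}'(\mathrm{Correct}_{G_n}(A)) \to 0.$$

Thus, for any deterministic algorithm A,

$$\mathrm{PCD}_{G_n}(A) \to 0$$

as well. Now, consider any random diagnosis algorithm A. Then, $\forall P_{G_n}' \in \mathcal{P}_{G_n}$,

$$\mathrm{PCD}_{G_n}(A) \le \sum_{(S,F) \in \Omega_{G_n}} P_{G_n}'((S, F)) \cdot p_{A,S}(F)$$

Consider the deterministic algorithm A' that for any syndrome S chooses fault set
F such that $\forall F' \subseteq U_n$,

$$P_{G_n}'((S, F)) \ge P_{G_n}'((S, F')).$$

Then, if $\mathcal{S}$ represents the set of all syndromes in $G_n$,

$$\mathrm{PCD}_{G_n}(A) \le \sum_{(S,F) \in \Omega_{G_n}} P_{G_n}'((S, \mathrm{Faulty}_{A'}(S))) \cdot p_{A,S}(F)$$

$$= \sum_{S \in \mathcal{S}} \sum_{F \subseteq U_n} P_{G_n}'((S, \mathrm{Faulty}_{A'}(S))) \cdot p_{A,S}(F)$$

$$= \sum_{S \in \mathcal{S}} P_{G_n}'((S, \mathrm{Faulty}_{A'}(S))) \sum_{F \subseteq U_n} p_{A,S}(F)$$

$$= P_{G_n}'(\mathrm{Correct}_{G_n}(A'))$$

$$\to 0$$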
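
As a worked consequence of the key inequality in the proof (a derivation added here for illustration, not a statement taken from the paper), writing $x = P_{G_n}'(\mathrm{Correct}_{G_n}(A))$, $m = \max(p, 1 - p)$ and $k = |ISO_{G_n}|$, and using $P_{G_n}'(\mathrm{NotCorrect}_{G_n}(A)) = 1 - x$, one obtains an explicit upper bound for a deterministic algorithm A:

```latex
\begin{align*}
1 - x &\ge \left( \frac{1}{m^{k}} - 1 \right) x \\
\Longrightarrow \quad 1 &\ge \frac{x}{m^{k}} \\
\Longrightarrow \quad x &\le m^{k} \le m^{\,n - 2|E_n|}.
\end{align*}
```

For example, with p = 0.1 and 50 isolated processors, the probability of correct diagnosis is at most $0.9^{50}$, which is about 0.005.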


This proof in fact yields a stronger result than is stated in Theorem 3 in the
[...] with high probability.

8 Diagnosis in Regular Systems

The study of regular systems is important for several reasons. First, in many
applications [...] total number of tests, regular systems require the minimum overhead under
[...] The diagnosis algorithm given in [5] was shown to achieve correct diagnosis with
probability approaching [...] c > 1/(log 1/p). Furthermore, it was proven in [10] under
[...] them amenable to diagnosis.


9 Conclusion 

A uniformly probabilistic fault model for multiprocessor systems in which processors
are faulty with probability p has been studied. It has been shown that correct
[...] a probability approaching zero of correct diagnosis.